Abstract. A compressed sensing method consists of a rectangular measurement matrix, $M \in \mathbb{R}^{m \times N}$
with $m \ll N$, together with an associated recovery algorithm, $\mathcal{A} : \mathbb{R}^m \rightarrow \mathbb{R}^N$. Compressed
sensing methods aim to construct a high quality approximation to any given input vector $x \in \mathbb{R}^N$ using
only $Mx \in \mathbb{R}^m$ as input. In particular, we focus herein on instance optimal nonlinear approximation
error bounds for $M$ and $\mathcal{A}$ of the form

$\| x - \mathcal{A}(Mx) \|_p \leq \| x - x_k^{\rm opt} \|_p + C k^{1/p - 1/q} \| x - x_k^{\rm opt} \|_q$

for $x \in \mathbb{R}^N$, where $x_k^{\rm opt}$ is the best possible $k$-term approximation to $x$.

In this paper we develop a compressed sensing method whose associated recovery algorithm, $\mathcal{A}$,
runs in $O((k \log k) \log N)$-time, matching a lower bound up to a $O(\log k)$ factor. This runtime is
obtained by using a new class of sparse binary compressed sensing matrices of near optimal size in 
combination with sublinear-time recovery techniques motivated by sketching algorithms for high- 
volume data streams. The new class of matrices is constructed by randomly subsampling rows from 
well-chosen incoherent matrix constructions which already have a sub-linear number of rows. As 
a consequence, fewer random bits than previously required are needed in order to select the rows 
utilized by the fast reconstruction algorithms considered herein. 

1. Introduction 

Noisy group testing problems generally involve designing pooling schemes which use as few 
expensive tests as possible in order to identify a small number of important elements from a large 
universe, U, of items (see, e.g., [13]). In this setting each one of the expensive tests in question 
corresponds to observing the result of an experiment, or calculation, performed on a different subset 
of U. If each test is sufficiently sensitive to the small number of hidden items in U that must be 
identified, one might hope that testing a correspondingly small number of subsets of U in bulk 
would still allow the hidden elements to be found. Thus, designing a pooling scheme corresponds 
to choosing a good collection of subsets of U to observe so that tests performed on these subsets 
will always allow one to discover a small number of important elements hidden within U. 

Many data mining tasks can be cast in a similar framework, that is, as problems concerned with
identifying a small number of interesting items from a tremendously large group without exceeding
certain resource constraints (e.g., without using too much memory, communication power, runtime,
etc.). Specific examples include the sketching and monitoring of heavy-hitters in high-volume
data streams [5], [7], source localization in sensor networks [24], and the design of high throughput
sequencing schemes for biological specimen analysis [Tl]. Note that pooling schemes in many such
group testing related applications naturally correspond to binary matrices (i.e., because each row
of the binary matrix, $r \in \{0,1\}^N$, selects a subset for testing/observation). Furthermore, it is
generally better for these binary matrices to have a small number of nonzero entries in each
column (i.e., because this reduces the number of times each item in U must be tested/observed).
Thus, we focus on designing sparse binary measurement matrices herein.

Roughly speaking, one can cast many such applications as a type of compressed sensing [12]
problem. The large set containing the small number of important elements we want to identify is
modeled as a vector, $x \in \mathbb{R}^N$. The $n$-th entry in the vector, $x_n$, is a real number which indicates
the "importance" of the $n$-th set element (the larger the magnitude, the more important). Our
goal is now to locate $k \ll N$ of the largest magnitude entries of $x$ (i.e., the important elements).
Unfortunately, for reasons that vary with the specific problem at hand (e.g., because only $o(N)$
memory is available in the massive data stream context), we are allowed to store just $m \ll N$
linear measurements of $x$ which we must compute during a single pass over its entries. The $m$
linear measurement operators are represented as a measurement matrix, $M \in \mathbb{R}^{m \times N}$. Having
access to only $Mx \in \mathbb{R}^m$, we seek to identify, and then estimate, the $k$ largest magnitude entries of
$x$. This identification and estimation is performed by a sparse recovery algorithm, $\mathcal{A} : \mathbb{R}^m \rightarrow \mathbb{R}^N$,
which (implicitly) returns a vector in $\mathbb{R}^N$ having $O(k)$ nonzero entries. We prefer $\mathcal{A}$ to be fast,
especially for applications involving massive data sets.

In this paper we consider the design of sparse matrices $M \in \{0,1\}^{m \times N}$ with $m \ll N$, together
with associated nonlinear functions, $\mathcal{A} : \mathbb{R}^m \rightarrow \mathbb{R}^N$, which have the property that $\mathcal{A}(Mx) \approx x$ for
all vectors $x \in \mathbb{R}^N$ that are well approximated by their best $k$-term approximation,

(1)  $x_k^{\rm opt} := \operatorname{argmin}_{y \in \mathbb{R}^N,\ \|y\|_0 \leq k} \| x - y \|_2.$

More specifically, we will focus on designing $(M, \mathcal{A})$-pairs which achieve error guarantees of the
form

(2)  $\| x - \mathcal{A}(Mx) \|_p \leq \min_{y \in \mathbb{R}^N,\ \|y\|_0 \leq k} \left( \| x - y \|_p + C_{p,q} \cdot k^{1/p - 1/q} \| x - y \|_q \right)$

for constants $1 \leq q \leq p \leq 2$, and $C_{p,q} \in \mathbb{R}^+$ (e.g., see [6], [E]). We will refer to such an error
guarantee as an "$\ell_p,\ell_q$" error guarantee below.

Over the past several years this type of design problem has received a considerable amount of
attention under the moniker of "compressed sensing" (e.g., see [15], [12], HJ, [10], [3], [2], [20], and references
therein). Most compressed sensing papers, this one included, generate their measurement matrices,
$M$, randomly. This leads to two different probabilistic models in which the aforementioned
"$\ell_p,\ell_q$" error guarantees may hold. In the first model, a single randomly generated measurement
matrix, $M$, is shown to satisfy (2) for all $x \in \mathbb{R}^N$ with high probability. We will refer to this as the
"for all" model. In the second model, a randomly generated measurement matrix is shown to satisfy
(2) for each given $x \in \mathbb{R}^N$ with high probability (assuming that $M$ is generated independently of
$x$). We will refer to this second model as the "for each" model. All results proven herein are proven
in the second, "for each", model.

1.1. Results and Related Work. Any sparse recovery algorithm, $\mathcal{A}$, that achieves either an
"$\ell_1,\ell_1$", "$\ell_2,\ell_1$", or "$\ell_2,\ell_2$" error guarantee in the "for each" model must use an associated
measurement matrix, $M$, having at least $m \geq C k \log(N/k)$ rows [11], [22]. Note that this implies an
$\Omega(k \log(N/k))$ lower runtime complexity bound for the recovery algorithm, $\mathcal{A}$. It remains an open
problem to prove (or disprove) the existence of a $O(k \log N)$-time recovery algorithm achieving
any such error guarantee. In this paper we present a compressed sensing matrix/recovery algorithm
pair, $(M, \mathcal{A})$, with an "$\ell_2,\ell_1$" guarantee, where $\mathcal{A}$ runs in $O((k \log k) \log N)$-time, a single
$O(\log k)$-factor from the known lower bound. We also present two other compressed sensing results
which can be obtained using the same methods: one which uses an optimal number (up to constant
factors) of randomly selected rows from an incoherent binary matrix as measurements, and another


Here $\|y\|_0$ denotes the number of nonzero entries in $y \in \mathbb{R}^N$, while $\|y\|_p$ denotes the standard $\ell_p$-norm for all
$p \geq 1$, i.e., $\|y\|_p = \left( \sum_{n=0}^{N-1} |y_n|^p \right)^{1/p}$ for all $y \in \mathbb{R}^N$.

Here $C$ will always represent an absolute constant.
Figure 1. Summary of previous sub-linear time results, and the results obtained herein.

0(A; log^ A^)-time recovery result which requires fewer random bit^ than previous algorithms (i.e., 
less randomness). 

Previous work involving the development of compressed sensing methods having both sub-linear 
time reconstruction algorithms, and the type of "^p,^q" error guarantees considered herein, began 
with [9]. In [9] Cormode et al. built on streaming algorithm techniques with weaker error guarantees 
(e.g., see [5l [71 [8]) in order to develop 0(fclog^ A^)-time recovery algorithms, A, with associated 
"•^2)^2" error guarantees in the "for each" model. Similar techniques were later utilized by Gilbert 
et al. in [16] to create sub-linear time algorithms with the same error guarantees, but whose
associated measurement matrices, $M \in \mathbb{R}^{m \times N}$, have a near-optimal number of rows up to constant
factors (i.e., $m = O(k \log N)$). Other related compressed sensing methods with fast runtimes and
"$\ell_2,\ell_1$" error guarantees in the "for all" model were also considered in [17]. Unlike these previous
methods, the compressed sensing methods developed herein utilize the combinatorial properties of 
a new class of sparse binary measurement matrices formed by randomly selecting sub-matrices from 
larger incoherent matrices. 

Perhaps the measurement matrices considered herein are most similar to previous compressed
sensing matrices based on unbalanced expander graphs (see, e.g., [2], [18], [20]). Indeed, the
measurement matrices used in this paper are created by randomly sampling rows from larger binary
matrices that are, in fact, the adjacency matrices of a subclass of unbalanced expander graphs.
However, unlike previous approaches which use the properties of general unbalanced expanders,
we use different combinatorial techniques which allow us to develop $O(k\ \mathrm{polylog}\ N)$-time recovery
algorithms. To the best of our knowledge, the runtimes we obtain by doing so are the best known
for any such method having "$\ell_p,\ell_q$" error guarantees.

See Figure 1 for a comparison of the sub-linear time compressed sensing results proven herein
(last two rows) with previous sub-linear time compressed sensing results discussed above (first
three rows). The columns of Figure 1 list the following characteristics of each compressed sensing
method: (i) the number of measurement matrix rows, $m$, (ii) the runtime complexity of the recovery
algorithm, and (iii) the "$\ell_p,\ell_q$" error guarantee achieved by the method. All error guarantees hold
in the "for each" model unless indicated otherwise by a /. 

1.2. Techniques and Organization. It has been shown that all binary matrices satisfying easily 
verifiable coherence condition^ have strong combinatorial properties capable of producing entirely 
deterministic compressed sensing algorithms requiring Q,(k^ log N) runtime and measurements [1]. 
In this paper we demonstrate a general means for utilizing these same types of matrices to construct 
compressed sensing approximation schemes with near-optimal runtime and sampling complexities. 

■^More precisely, the number of random bits is 0(log^ k). To the best of our knowledge this represents the first 
fast recovery result which requires a number of random bits that is entirely independent of $N$, the length of $x$.

Any matrix whose maximum inner product between all pairs of columns is small compared to the minimal number
of ones in each column satisfies the required coherence conditions. See Section 2 for details.
Our new compressed sensing matrices are formed by randomly sampling a small number of rows
from any sufficiently incoherent binary matrix. The resulting random sub-matrices are then shown
to still satisfy sufficiently strong combinatorial properties with respect to any given input vector, $x$,
in order to allow standard fast compressed sensing techniques (i.e., similar to those utilized in [9])
to produce accurate results. Furthermore, the theory is developed in a modular fashion, making it
easy to utilize different binary incoherent matrix constructions in order to generate new results. We
take advantage of this modularity in order to generate the two new results listed in Figure 1, as well
as to show that our new measurement matrices also allow for compressive sensing with an optimal
number of measurements (up to constant factors) in $O(N \log N)$-time. Each result is produced
by utilizing a different combination of two incoherent binary matrix constructions: deterministic
algebraic constructions due to DeVore [10], and randomly constructed incoherent binary matrices
with fewer rows constructed below in Section 3.

The remainder of this paper is organized as follows: In Section 2 we fix notation and review
existing results that are needed for later sections. In Section 3 we construct incoherent binary
matrices with a near optimal number of rows. These new binary matrices ultimately allow the
development of our $O((k \log k) \log N)$-time recovery result via the techniques developed later in
Section 4. Section 4 constructs our compressed sensing measurement matrices by randomly sampling
rows from the previously discussed binary incoherent matrices (i.e., from both the matrices
reviewed in Section 2 as well as the matrices constructed in Section 3). Our main results are then
proven in Section 5. Finally, we conclude with a short discussion in Section 6.

2. Preliminaries 

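Let $[N] = \{0, \dots, N-1\}$ for any $N \in \mathbb{N}$. We consider the elements of any given $x \in \mathbb{R}^N$ to
be ordered according to magnitude by the sequence $j_0, j_1, \dots, j_{N-1}$ so that $|x_{j_0}| \geq |x_{j_1}| \geq \dots \geq |x_{j_{N-1}}|$.
We set $S_k^{\rm opt} = \{ j_0, j_1, \dots, j_{k-1} \} \subset [N]$ for a given $x$, and let $x_k^{\rm opt} \in \mathbb{R}^N$ denote the
associated vector with exactly $k$ nonzero entries:

$\left( x_k^{\rm opt} \right)_{j_0} = x_{j_0},\ \left( x_k^{\rm opt} \right)_{j_1} = x_{j_1},\ \dots,\ \left( x_k^{\rm opt} \right)_{j_{k-1}} = x_{j_{k-1}}.$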

All results below deal with randomly sampling rows from a rectangular binary matrix whose columns 
are all nearly pairwise orthogonal. 
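
As a small illustration of the magnitude ordering and best $k$-term approximation defined at the start of this section, the following Python sketch computes the index set of the $k$ largest-magnitude entries and the corresponding $k$-term vector. The function names and the tie-breaking rule (smaller index first) are assumptions made for this example only; they are not taken from the paper.

```python
def best_k_term(x, k):
    """Return (S, xk): the indices of the k largest-magnitude entries of x
    and the vector that keeps only those entries (all others set to 0)."""
    # Sort indices by decreasing magnitude; ties go to the smaller index
    # (the tie-breaking rule is an assumption of this sketch).
    order = sorted(range(len(x)), key=lambda n: (-abs(x[n]), n))
    support = set(order[:k])
    xk = [x[n] if n in support else 0.0 for n in range(len(x))]
    return sorted(support), xk


def lp_norm(y, p):
    """Standard l_p norm for p >= 1."""
    return sum(abs(v) ** p for v in y) ** (1.0 / p)


x = [0.1, -5.0, 0.0, 3.0, -0.2, 0.05]
S, xk = best_k_term(x, 2)
residual = [a - b for a, b in zip(x, xk)]
tail_l1 = lp_norm(residual, 1)  # the l_1 error of the best 2-term approximation
```

Here `tail_l1` plays the role of the tail norm that appears on the right-hand side of the error guarantees discussed in the introduction.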

Definition 1. Let $K, \alpha \in [N]$. An $m \times N$ matrix, $M \in \{0,1\}^{m \times N}$, is called $(K, \alpha)$-coherent if both
of the following properties hold:

(1) Every column of $M$ contains at least $K$ ones.

(2) For all $j, l \in [N]$ with $j \neq l$, the associated columns, $M_{\cdot,j}$ and $M_{\cdot,l} \in \{0,1\}^m$, have
$\langle M_{\cdot,j}, M_{\cdot,l} \rangle \leq \alpha$.

These matrices are closely related to nonadaptive group testing matrices, unbalanced expander 
graphs, binary matrices with the restricted isometry property, and codebook design problems in 
signal processing. Several (implicit) constructions of $(K, \alpha)$-coherent matrices exist (e.g., the number
theoretic and algebraic constructions of ^ and [10], respectively). In addition, every $(K, \alpha)$-coherent
matrix must have $\Omega\left( \min\left\{ (K^2/\alpha^2) \log_{K/\alpha} N,\ N \right\} \right)$ rows. See [1] for details.

Given any binary matrix $M \in \{0,1\}^{m \times N}$ with at least $K \in [m]$ ones in column $n \in [N]$, let
$M(K, n)$ denote a $K \times N$ submatrix of $M$ created by selecting the first $K$ rows of $M$ with nonzero
entries in the $n$-th column. The following useful fact concerning $(K, \alpha)$-coherent matrices is proven
in [1].



'See Theorem [S] for details. 
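Lemma 1. Suppose $M$ is a $(K, \alpha)$-coherent matrix. Let $n \in [N]$, $k \in [K \cdot \frac{\epsilon}{\alpha}]$, $\epsilon \in (0,1]$,
$c \in [2, \infty) \cap \mathbb{N}$, and $x \in \mathbb{R}^N$. If $K > c \cdot (k\alpha/\epsilon)$ then $(M(K,n) \cdot x)_j$ will be contained in the interval

$\left( x_n - \frac{\epsilon \left\| x - x_{(k/\epsilon)}^{\rm opt} \right\|_1}{k},\ x_n + \frac{\epsilon \left\| x - x_{(k/\epsilon)}^{\rm opt} \right\|_1}{k} \right)$

for more than $\frac{c-2}{c} \cdot K$ values of $j \in [K]$.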



In addition to $(K, \alpha)$-coherent matrices, we will also utilize a bit-test matrix, $B_N \in \{0,1\}^{(1 + \lceil \log_2 N \rceil) \times N}$,
whose $n$-th column is a one followed by $n \in [N]$ written in base 2. These bit-test matrices will allow
us to quickly identify large elements of a vector $x$ using techniques from [9]. The row tensor product
of two matrices, $A \in \mathbb{R}^{m_1 \times N}$ and $B \in \mathbb{R}^{m_2 \times N}$, denoted $A \circledast B$, is defined to be the $(m_1 \cdot m_2) \times N$
matrix whose entries are given by

$(A \circledast B)_{i,j} = A_{i \bmod m_1,\, j} \cdot B_{\frac{i - (i \bmod m_1)}{m_1},\, j}.$
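
To make the bit-test matrix and the row tensor product concrete, here is a minimal Python sketch using plain nested lists. The function names are hypothetical, and the choice to write the bits of $n$ least-significant first is an assumption of this example (the text only says "written in base 2").

```python
def bit_test_matrix(N):
    """(1 + ceil(log2 N)) x N binary matrix: column n is a 1 followed by the
    bits of n (least-significant bit first; the bit order is an assumption)."""
    b = (N - 1).bit_length()  # equals ceil(log2 N) for every N >= 1
    rows = [[1] * N]
    for t in range(b):
        rows.append([(n >> t) & 1 for n in range(N)])
    return rows


def row_tensor(A, B):
    """Row tensor product: row i of the result is the entrywise product of
    row (i mod m1) of A with row (i - (i mod m1)) / m1 of B."""
    m1, m2, N = len(A), len(B), len(A[0])
    return [
        [A[i % m1][j] * B[(i - i % m1) // m1][j] for j in range(N)]
        for i in range(m1 * m2)
    ]


A = [[1, 0, 1, 0],
     [0, 1, 0, 1]]
B = bit_test_matrix(4)
T = row_tensor(A, B)  # 6 x 4 matrix
```

Each row of `T` restricts a row of `A` to the columns whose index has a given bit set, which is what lets a decoder read off the index of a dominant entry one bit at a time.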

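The following theorem was proven in [1].

Theorem 1. Let $\epsilon \in (0, 1]$, $k \in [K \cdot \frac{\epsilon}{\alpha}]$, and $x \in \mathbb{R}^N$. Furthermore, suppose that $M$ is an $m \times N$
binary matrix with the property that

(3)  $(M(K,n) \cdot x)_j \in \left( x_n - \frac{\epsilon \left\| x - x_{(k/\epsilon)}^{\rm opt} \right\|_1}{k},\ x_n + \frac{\epsilon \left\| x - x_{(k/\epsilon)}^{\rm opt} \right\|_1}{k} \right)$

for more than $K/2$ values of $j \in [K]$ for all $n \in [N]$. Then, there exists an algorithm that takes $M$
and $(M \circledast B_N) x$ as input, and outputs a vector $z \in \mathbb{R}^N$ satisfying

$\| x - z \|_2 \leq \left\| x - x_k^{\rm opt} \right\|_2 + \frac{22 \epsilon \left\| x - x_{(k/\epsilon)}^{\rm opt} \right\|_1}{\sqrt{k}}.$

Furthermore, the algorithm can be implemented to run in $O(m \log N)$ time.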

Pseudocode for a faster randomized variant of the algorithm referred to by Theorem 1 can be
found in the Appendix. This randomized variant and its associated measurement matrices are the
focus of this paper. Briefly put, both algorithms operate in two phases. During the first phase all
heavy entries of the input vector, $x$, are identified using standard bit testing techniques [9]. These
heavy vector elements are then estimated during the second phase using an approach along the
lines of those from the computer science streaming literature [8], [9]. The approximations provided by the
binary matrices in Lemma 1 guarantee that taking the median of all $K$ entries of $(M(K, n) \cdot x)$ will
provide a good estimate of each important entry, $x_n$.
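
The two phases can be illustrated with a toy Python sketch. This is a simplified illustration under stated assumptions, not the Algorithm 1 of the Appendix: it assumes a single heavy entry dominates each measured row, uses a least-significant-bit-first bit test, and all function and variable names are hypothetical.

```python
from statistics import median


def measure_row(r, x):
    """Bit-test measurements of one binary row r: y[0] = <r, x>, and y[t + 1]
    is the same sum restricted to indices whose t-th bit is set."""
    N = len(x)
    b = (N - 1).bit_length()
    y = [sum(r[n] * x[n] for n in range(N))]
    for t in range(b):
        y.append(sum(r[n] * x[n] for n in range(N) if (n >> t) & 1))
    return y


def identify_from_bit_tests(y):
    """Phase 1 (sketch): bit t of the dominant index is taken to be 1 when the
    'bit set' part of the measurement outweighs the remainder."""
    n = 0
    for t in range(len(y) - 1):
        if abs(y[t + 1]) > abs(y[0] - y[t + 1]):
            n |= 1 << t
    return n


x = [0.0] * 16
x[13], x[2], x[7] = 10.0, 0.3, -0.2  # one heavy entry plus small noise

rows = [
    [1] * 16,
    [n % 2 for n in range(16)],
    [1 if n % 3 == 1 else 0 for n in range(16)],
]

found = identify_from_bit_tests(measure_row(rows[0], x))

# Phase 2 (sketch): estimate x[found] by the median over rows containing it.
estimate = median(
    sum(r[n] * x[n] for n in range(16)) for r in rows if r[found]
)
```

The median is used because a minority of rows may be contaminated by other large entries, while most rows that contain the heavy index are only perturbed by small tail entries.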

2.1. Efficiently Storing a $(K, \lfloor \log_K N \rfloor)$-coherent Matrix. Let $P, N \in \mathbb{N}$, where $P$ is prime.
DeVore, using techniques along the lines of Kashin [21], gives a deterministic construction of
$(P, \lfloor \log_P N \rfloor)$-coherent matrices having $P^2$ rows and $N$ columns in [10]. This construction, together
with Bertrand's postulate, yields general $(K, \lfloor \log_K N \rfloor)$-coherent matrices with $K^2 \leq m < 4K^2$ rows
and $N$ columns for any given $K, N \in \mathbb{N}$. In Section 4 we will start to construct compressed sensing
matrices of near-optimal size by randomly selecting a small set of rows from one of DeVore's
deterministic $(K, \lfloor \log_K N \rfloor)$-coherent matrices. However, before doing so we will discuss the complexity
of storing and regenerating submatrices of DeVore's $(K, \lfloor \log_K N \rfloor)$-coherent matrices.

As above, let $P, N \in \mathbb{N}$ with $P$ prime. Furthermore, let $\mathbb{F}$ be the finite field of order $P$. Every
column of a $P^2 \times N$ $(P, \lfloor \log_P N \rfloor)$-coherent matrix as constructed in [10] has an associated polynomial
over $\mathbb{F}$ of degree at most $\lceil \log_P N \rceil - 1$. We will assume that these polynomials are ordered so that
the polynomial associated with the $j$-th column is

$Q_j(x) := j_0 + j_1 x + j_2 x^2 + \dots + j_{\lceil \log_P N \rceil - 1}\, x^{\lceil \log_P N \rceil - 1},$

where $j_0, \dots, j_{\lceil \log_P N \rceil - 1} \in [P]$ are the digits of $j \in [N]$ base $P$. That is,

$j = j_0 + j_1 P + j_2 P^2 + \dots + j_{\lceil \log_P N \rceil - 1}\, P^{\lceil \log_P N \rceil - 1}.$

Thus, $Q_j$ with $j \in [N]$ can be obtained in $O(\log_P N)$-time by finding the representation of $j$ base
$P$.

Let $M \in \{0,1\}^{P^2 \times N}$ be a $(P, \lfloor \log_P N \rfloor)$-coherent matrix as constructed in [10]. The rows of
$M$ are indexed by elements of $[P] \times [P]$, ordered lexicographically. Given $j \in [N]$ the ones in the
$j$-th column of $M$ appear in rows

$(0, Q_j(0)),\ (1, Q_j(1)),\ (2, Q_j(2)),\ \dots,\ (P-1, Q_j(P-1)).$

Given $p \in [P]$ we will refer to the set of $P$ rows of $M$,

$\{ (p, r) \mid r \in [P] \},$

as the $p$-th block of rows. Every column of $M$, $j \in [N]$, will have exactly one 1 in each such block.
Furthermore, this 1 can be located in $O(\log_P N)$-time by using Horner's rule on $Q_j$.
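
The column regeneration described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, rows $(p, r)$ are flattened to the integer $pP + r$ (one way to realize the lexicographic ordering), and $P$ is assumed to be prime.

```python
def base_p_digits(j, P, d):
    """Digits j_0, ..., j_{d-1} of j written in base P (least significant first)."""
    digits = []
    for _ in range(d):
        digits.append(j % P)
        j //= P
    return digits


def horner(digits, x, P):
    """Evaluate Q_j(x) = j_0 + j_1 x + ... over the field of order P by Horner's rule."""
    acc = 0
    for c in reversed(digits):
        acc = (acc * x + c) % P
    return acc


def devore_column_rows(j, P, N):
    """Row indices (flattened as p * P + r) of the P ones in column j."""
    d = 1
    while P ** d < N:  # d = ceil(log_P N), computed with integers only
        d += 1
    digits = base_p_digits(j, P, d)
    return [p * P + horner(digits, p, P) for p in range(P)]


P, N = 5, 25
columns = [set(devore_column_rows(j, P, N)) for j in range(N)]
max_overlap = max(
    len(columns[j] & columns[l]) for j in range(N) for l in range(j + 1, N)
)
```

With $P = 5$ and $N = 25$ the polynomials have degree at most 1, so two distinct columns can share at most one row, and `max_overlap` reflects that. Only $j$, $P$, and $N$ need to be stored to regenerate any column.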



3. Existence of Near Optimal $(K, \alpha)$-coherent Matrices

In this section we will use standard probabilistic arguments to demonstrate the existence of
$(K, \alpha)$-coherent matrices having a near-optimal number of rows. In particular, we will demonstrate
that a randomly generated matrix having $m = O(K^2/\alpha)$ rows will be $(K, \alpha)$-coherent with high
probability. The end result is that the methods herein can be utilized as the basis for a $O(mN)$-time
Monte Carlo algorithm for building near-optimal $(K, \alpha)$-coherent matrices.

DeVore's construction yields $(K, \lfloor \log_K N \rfloor)$-coherent matrices having $O(K^2)$ rows, exceeding the
lower bound, $\Omega\left( (K^2/\alpha^2) \log_{K/\alpha} N \right)$, by an $\alpha = O(\log_K N)$-factor (assuming that $K \gg \alpha$). In this
section we demonstrate the existence of $(\Theta(K), O(\ln N))$-coherent matrices having $O(K^2/\alpha)$ rows.
These matrices exceed the lower bound by a $O(\log K)$ factor, and represent a general improvement
over DeVore's construction with respect to row count.

We will build $M \in \{0,1\}^{m \times N}$ by letting each entry, $M_{i,j}$, be an independent and identically
distributed Bernoulli random variable that is 1 with probability $p$ and 0 with probability $1 - p$.
Note that the number of ones in a given column of $M$ will be a binomial random variable in this
case. Similarly, the inner product between any two given columns of $M$ will also be binomial.
Hence, we may bound both of these quantities using the Chernoff and union bounds. We have the
following two lemmas.
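
The random construction and the two quantities being bounded can be sketched as follows. The helper names and the use of a seeded `random.Random` are assumptions of this example; the check simply computes the smallest column weight and the largest pairwise column inner product, which takes time quadratic in the number of columns.

```python
import random


def random_binary_matrix(m, N, p, seed=0):
    """m x N matrix of i.i.d. Bernoulli(p) entries (seeded for repeatability)."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(N)] for _ in range(m)]


def coherence_parameters(M):
    """Return (K, alpha): the minimum number of ones in any column and the
    maximum inner product between two distinct columns of M."""
    m, N = len(M), len(M[0])
    cols = [{i for i in range(m) if M[i][j]} for j in range(N)]
    K = min(len(c) for c in cols)
    alpha = max(len(cols[j] & cols[l]) for j in range(N) for l in range(j + 1, N))
    return K, alpha


fixed = [[1, 1, 0],
         [1, 0, 1],
         [0, 1, 1],
         [1, 1, 1]]
K_fixed, alpha_fixed = coherence_parameters(fixed)

sample = random_binary_matrix(40, 6, 0.3, seed=1)
K_sample, alpha_sample = coherence_parameters(sample)
```

A matrix is $(K, \alpha)$-coherent for any $K$ at most the first returned value and any $\alpha$ at least the second, so regenerating until the check passes gives a simple Las Vegas style procedure.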

Lemma 2. Let $\sigma, p \in [0,1)$ and $m, N, K \in \mathbb{N}$. Randomly generate a matrix, $M \in \{0,1\}^{m \times N}$,
each of whose entries is an i.i.d. Bernoulli random variable which is 1 with probability $p$. If $K$ is
$\Omega(\log N)$ and

(4)  $mp = K + \ln\left( \frac{3N}{1-\sigma} \right) + \sqrt{ 2K \ln\left( \frac{3N}{1-\sigma} \right) + \ln^2\left( \frac{3N}{1-\sigma} \right) }$

then every column of $M$ will contain $\Theta(K)$ ones with probability at least $1 - \frac{2(1-\sigma)}{3}$.



We assume $O(1)$-time arithmetic operations (e.g., $+$, $-$, $\cdot$, $/$) throughout this paper.

Furthermore, it is worth recalling that these same methods also provide Las Vegas algorithms which run in
expected $O(mN^2)$-time.
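Proof: Let $S_j$, $j \in [N]$, be the number of ones in column $j$ of $M$. We have $\mathbb{E}[S_j] = mp$. The
Chernoff bound now implies that

$\mathbb{P}[S_j < K] = \mathbb{P}\left[ S_j < \left( 1 - \left( 1 - \frac{K}{mp} \right) \right) mp \right] \leq e^{-mp \left( 1 - \frac{K}{mp} \right)^2 / 2}$

as long as $K < mp$. Thus, we can bound the probability that $S_j < K$ above by $\frac{1-\sigma}{3N}$ for any desired
$\sigma \in [0, 1)$ by ensuring that

(5)  $\mathbb{P}[S_j < K] \leq e^{-mp \left( 1 - \frac{K}{mp} \right)^2 / 2} \leq \frac{1-\sigma}{3N}.$

Simplifying Equation (5) above, we obtain

$mp \left( 1 - \frac{K}{mp} \right)^2 \geq 2 \ln\left( \frac{3N}{1-\sigma} \right).$

Solving for $mp$ in terms of $K$ and $N$, we learn that

$(mp)^2 - 2 \left( K + \ln\left( \frac{3N}{1-\sigma} \right) \right) (mp) + K^2 \geq 0.$

This will hold whenever

$mp \geq K + \ln\left( \frac{3N}{1-\sigma} \right) + \sqrt{ 2K \ln\left( \frac{3N}{1-\sigma} \right) + \ln^2\left( \frac{3N}{1-\sigma} \right) } > K.$

Applying Equation (5) together with the union bound over all choices of $S_j$ yields the desired
lower bound. A similar argument guarantees that every column will also have fewer than

$eK + e \ln\left( \frac{3N}{1-\sigma} \right) + e \sqrt{ 2K \ln\left( \frac{3N}{1-\sigma} \right) + \ln^2\left( \frac{3N}{1-\sigma} \right) }$

ones with probability at least $1 - \frac{1-\sigma}{3}$. □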



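Lemma 3. Let $\sigma, p \in [0,1)$ and $m, N, K \in \mathbb{N}$. Randomly generate a matrix, $M \in \{0,1\}^{m \times N}$,
each of whose entries is an i.i.d. Bernoulli random variable which is 1 with probability $p$. If
$\alpha = 2mp^2 \geq 2 \log_{4/e}\left( \frac{3N^2}{1-\sigma} \right)$ then all pairs of columns of $M$ will have inner product at most $\alpha$ with
probability at least $1 - \frac{1-\sigma}{3}$.

Proof: Let $I_{i,j}$ be the inner product of the $j$-th column of $M$ with the $i$-th column of $M$ for a given
$i, j \in [N]$ with $i \neq j$. We want $I_{i,j} \leq \alpha$. Since $I_{i,j}$ is binomial with $\mathbb{E}[I_{i,j}] = mp^2$ the Chernoff
bound implies that

$\mathbb{P}[I_{i,j} > \alpha] = \mathbb{P}\left[ I_{i,j} > \frac{\alpha}{mp^2} \cdot \mathbb{E}[I_{i,j}] \right] < \left( \frac{ e^{\alpha/mp^2 - 1} }{ (\alpha/mp^2)^{\alpha/mp^2} } \right)^{mp^2}$

as long as $\alpha > mp^2$. For the sake of simplicity, suppose that $\alpha = 2mp^2 = 2 \log_{4/e}\left( \frac{3N^2}{1-\sigma} \right)$. Then

$\mathbb{P}[I_{i,j} > \alpha] < \left( \frac{e}{4} \right)^{\alpha/2} = \frac{1-\sigma}{3N^2}.$

In this case the union bound now guarantees that our randomly constructed matrix $M$ will also
satisfy the second $(K, \alpha)$-coherent property with probability at least $1 - \frac{1-\sigma}{3}$. □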
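Lemma 2 guarantees that a randomly constructed binary matrix will satisfy the first $(K, \alpha)$-coherent
property with high probability. Similarly, Lemma 3 guarantees the second $(K, \alpha)$-coherent
property. Solving for $p$ in light of Equation (5) and Lemma 3 we get that we can set

(6)  $p = \frac{mp^2}{mp} = \frac{ \log_{4/e}\left( \frac{3N^2}{1-\sigma} \right) }{ K + \ln\left( \frac{3N}{1-\sigma} \right) + \sqrt{ 2K \ln\left( \frac{3N}{1-\sigma} \right) + \ln^2\left( \frac{3N}{1-\sigma} \right) } }$

and

(7)  $m = \frac{mp}{p} = \frac{ \left( K + \ln\left( \frac{3N}{1-\sigma} \right) + \sqrt{ 2K \ln\left( \frac{3N}{1-\sigma} \right) + \ln^2\left( \frac{3N}{1-\sigma} \right) } \right)^2 }{ \log_{4/e}\left( \frac{3N^2}{1-\sigma} \right) }.$

Note these equations both make sense whenever $K > \alpha \geq 2 \log_{4/e}\left( \frac{3N^2}{1-\sigma} \right)$. We have the following.

Theorem 2. Fix $\sigma \in [0,1)$. Let $m, K, \alpha \in [N]$ be such that $K > \alpha \geq 2 \log_{4/e}\left( \frac{3N^2}{1-\sigma} \right)$, and let
$m = \Theta(K^2/\alpha)$ as per Equation (7). Randomly generate a matrix, $M \in \{0,1\}^{m \times N}$, each of whose
entries is an i.i.d. Bernoulli random variable which is 1 with the probability, $p$, given in Equation (6).
Then, $M$ will be both $(K, \alpha)$-coherent and have $\Theta(K)$ ones per column with probability at least $\sigma$.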
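
For a feel of the sizes involved, the following Python sketch evaluates the parameter choices numerically. It assumes the formulas read $mp = K + L + \sqrt{2KL + L^2}$ with $L = \ln(3N/(1-\sigma))$, and $mp^2 = \log_{4/e}(3N^2/(1-\sigma))$ with $\alpha = 2mp^2$; these are a reading of Lemmas 2 and 3, and the function name is hypothetical. The returned $m$ is left real-valued; rounding it up to an integer is left to the caller.

```python
from math import e, log, sqrt


def bernoulli_parameters(K, N, sigma):
    """Return (p, m, alpha) for the random construction, under the assumed
    forms of the equations stated in the lead-in."""
    L = log(3 * N / (1 - sigma))
    mp = K + L + sqrt(2 * K * L + L * L)                    # expected ones per column
    mp2 = log(3 * N * N / (1 - sigma)) / log(4 / e)          # log base 4/e
    p = mp2 / mp                                             # p = (m p^2) / (m p)
    m = mp / p                                               # m = (m p) / p
    alpha = 2 * mp2
    return p, m, alpha


p, m, alpha = bernoulli_parameters(K=200, N=1000, sigma=2 / 3)
```

In this example $\alpha$ comes out below $K$, the regime in which the parameter choices are stated to make sense, and $m$ scales like $K^2/\alpha$.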

Although the matrices developed above have fewer rows than DeVore's, we hasten to point out 
that they are generally less structured. This ultimately means that they will be difficult to store 
in compact form, and, therefore, of limited use when space complexity is a dominant concern. 

4. Sampling Rows from a $(K, \alpha)$-coherent Matrix

Given an $m \times N$ matrix $M$ and a subset $s \subset [m]$, we define $M_s$ to be the $|s| \times N$ sub matrix of $M$
consisting of the rows of $M$ contained in $s$. If $s$ is explicitly specified to be a multiset as opposed
to a set, we will (implicitly) repeat rows from $M$ contained in $s$ as necessary. Let $l_n$ denote the
number of nonzero entries in the $n$-th column of $M_s$. We define $M_s(l, n)$, $1 \leq l \leq l_n$, to be the $l \times N$
sub matrix of $M_s$ consisting of the first $l$ rows in $M_s$ with nonzero entries in the $n$-th column.

4.1. Identification Matrix. The following corollary to Lemma 1 will be used to construct matrices
for the identification of the largest magnitude entries in $x$. Note that the corollary is essentially
a coupon collection result (i.e., we want to collect, for each element in $S_{2k/\epsilon}^{\rm opt}$, a "good row" satisfying
Equation (9)).

Corollary 1. Suppose $M$ is an $m \times N$ $(K, \alpha)$-coherent matrix. Let $\epsilon^{-1} \in \mathbb{N}^+$, $k \in [\epsilon K/\alpha]$,
$c \in [14, \infty) \cap \mathbb{N}$, $\sigma \in [2/3, 1)$, and $x \in \mathbb{R}^N$. Select a subset of the rows of $M$, $s' \subset [m]$, by
independently choosing

(8)  $\gamma \geq \frac{7}{6} \cdot \frac{m}{K} \ln\left( \frac{2k/\epsilon}{1-\sigma} \right)$

values from $[m]$ uniformly at random with replacement. If $K > c \cdot (k\alpha/\epsilon)$ then with probability at
least $\sigma$ every $n \in S_{2k/\epsilon}^{\rm opt} \subset [N]$ will have an associated row of $M_{s'}$, $i_n \in [\gamma]$, for which

(9)  $\left| (M_{s'} \cdot x)_{i_n} - x_n \right| \leq \frac{\epsilon \left\| x - x_{(k/\epsilon)}^{\rm opt} \right\|_1}{k}.$
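Proof: Fix $n \in S_{2k/\epsilon}^{\rm opt}$. Lemma 1 implies that each randomly selected row of $M$, $j \in [m]$, will satisfy
Equation (9) with probability at least $\frac{6}{7} \cdot \frac{K}{m}$. Hence, the probability that none of the $\gamma$ selected rows
will satisfy Equation (9) is at most

$\left( 1 - \frac{6K}{7m} \right)^{\gamma}.$

Let $X = \frac{7}{6} \cdot \frac{m}{K}$. If $\gamma$ satisfies Equation (8) we have that

$\gamma \geq X \cdot \ln\left( \frac{2k/\epsilon}{1-\sigma} \right).$

This in turn implies that $e^{-\gamma/X} \leq \frac{1-\sigma}{2k/\epsilon}$. Thus,

$\left( 1 - \frac{6K}{7m} \right)^{\gamma} = \left( 1 - \frac{1}{X} \right)^{\gamma} \leq e^{-\gamma/X} \leq \frac{1-\sigma}{2k/\epsilon}$

whenever $\gamma$ satisfies Equation (8). Taking the union bound over all $2k/\epsilon$ elements of $S_{2k/\epsilon}^{\rm opt}$ finishes
the proof. □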
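
The coupon-collection count can be checked numerically. The sketch below assumes the sample size reads $\gamma \geq \frac{7}{6} \cdot \frac{m}{K} \ln\left( \frac{2k/\epsilon}{1-\sigma} \right)$ and that a single random row is "good" for a fixed index with probability at least $\frac{6K}{7m}$; the function names and the example parameter values are hypothetical.

```python
from math import ceil, log


def identification_sample_size(m, K, k, eps, sigma):
    """Smallest integer gamma with gamma >= (7/6) * (m/K) * ln((2k/eps)/(1-sigma))."""
    X = 7.0 * m / (6.0 * K)
    return ceil(X * log((2 * k / eps) / (1 - sigma)))


def miss_probability_bound(m, K, gamma):
    """Upper bound on the chance that none of gamma rows is good for one index."""
    return (1 - 6.0 * K / (7.0 * m)) ** gamma


m, K, k, eps, sigma = 2500, 50, 5, 0.5, 0.9
gamma = identification_sample_size(m, K, k, eps, sigma)
per_index = miss_probability_bound(m, K, gamma)
overall = (2 * k / eps) * per_index  # union bound over the 2k/eps heavy indices
```

With these numbers the overall failure bound stays below $1 - \sigma$, matching the union-bound step at the end of the proof.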



It is straightforward to show that a random sub matrix, $M_{s'}$, will have $O(\log N)$ ones in every
column with high probability when it is constructed as per Corollary 1 from a $(K, \alpha)$-coherent
matrix having $\Theta(K)$ ones per column. It is also important to note that an analogous variant of
Corollary 1 can be proven for DeVore's $(\Theta(K), O(\log_K N))$-coherent matrices by randomly selecting
$O\left( \ln\left( \frac{2k/\epsilon}{1-\sigma} \right) \right)$ blocks of $\Theta(K)$ rows (see Section 2.1). Randomly selecting rows from a DeVore
matrix in blocks both (i) guarantees that every column of the resulting sub matrix will have
$O\left( \ln\left( \frac{2k/\epsilon}{1-\sigma} \right) \right)$ ones, and (ii) requires only $O\left( \ln\left( \frac{2k/\epsilon}{1-\sigma} \right) \ln K \right)$ random bits. The following theorem
is proven via standard bit testing techniques (see, e.g., [Sill]).

Theorem 3. Suppose $M$ is an $m \times N$ $(K, \alpha)$-coherent matrix. Let $\epsilon \in (0,1]$, $\sigma \in [2/3,1)$,
$k \in [K \cdot \frac{\epsilon}{\alpha}]$, and $x \in \mathbb{R}^N$. Construct $M_{s'}$ as per Corollary 1. Then, with probability at least $\sigma$,
$(M_{s'} \circledast B_N) x$ will allow Phase 1 (i.e., lines 4 through 14) of Algorithm 1 in the appendix to recover
all $n \in [N]$ for which

(10)  $|x_n| \geq 4 \cdot \frac{\epsilon \left\| x - x_{(k/\epsilon)}^{\rm opt} \right\|_1}{k}.$

The required Phase 1 runtime is $O\left( \frac{m}{K} \ln\left( \frac{2k/\epsilon}{1-\sigma} \right) \ln N \right)$.

The only new observation required for the proof of Theorem 3 beyond those used to prove the
analogous results in [9], [H, [19] involves noting that any $n$ satisfying Equation (10) also belongs to
$S_{2k/\epsilon}^{\rm opt}$.

To finish, we note that applying (a variant of) Corollary 1 to a $(\Theta(K), O(\log_K N))$-coherent
matrix from Section 2.1 produces a random matrix, $M_{s'}$, having

$O\left( K \ln\left( \frac{2k/\epsilon}{1-\sigma} \right) \right)$

rows. For $\sigma$ fixed, this reduces to $O((k/\epsilon) \log N)$ rows. Furthermore, $M_{s'}$ will have $O(\log(k/\epsilon))$
ones in all columns. Applying Corollary 1 to a $(\Theta(K), O(\log N))$-coherent matrix from Section 3
produces a random matrix, $M_{s'}$, having

$O\left( \frac{m}{K} \ln\left( \frac{2k/\epsilon}{1-\sigma} \right) \right) = O\left( \frac{K}{\log N} \cdot \ln\left( \frac{2k/\epsilon}{1-\sigma} \right) \right) = O\left( \frac{k}{\epsilon} \cdot \ln\left( \frac{2k/\epsilon}{1-\sigma} \right) \right)$

rows. For $\sigma$ fixed, this reduces to $O((k/\epsilon) \log(k/\epsilon))$ rows. Furthermore, $M_{s'}$ will have $O(\log N)$
ones in all columns with high probability.

The result concerning the number of ones per column follows via techniques analogous to those utilized in Section 3 in order to establish Theorem 2 (i.e.,
via the Chernoff and union bounds).

4.2. Estimation Matrix. The following corollary constructs measurements capable of estimating
every entry of $x$ that is identified as large in magnitude during Phase 1 of Algorithm 1. Furthermore,
the estimation procedure is simple, requiring only median operations (see Phase 2 of Algorithm 1).

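Corollary 2. Suppose $M$ is an $m \times N$ $(K, \alpha)$-coherent matrix. Let $\epsilon^{-1} \in \mathbb{N}^+$, $k \in [\epsilon K/\alpha]$,
$c \in [14, \infty) \cap \mathbb{N}$, $\sigma \in [2/3, 1)$, $S \subset [N]$, and $x \in \mathbb{R}^N$. Select a multiset of the rows of $M$, $s \subset [m]$,
by independently choosing

(11)  $\beta \geq 28.56 \cdot \frac{m}{K} \ln\left( \frac{2|S|}{1-\sigma} \right)$

values from $[m]$ uniformly at random with replacement. If $K > c \cdot (k\alpha/\epsilon)$ then $M_s$ will have both
of the following properties with probability at least $\sigma$:

(1) There will be at least $l = 21 \cdot \ln\left( \frac{2|S|}{1-\sigma} \right)$ nonzero values in every column of $M_s$ indexed by $S$.
Hence, $M_s(l, n)$ will be well defined for all $n \in S$.

(2) For all $n \in S$ more than $l_n/2$ of the entries in $M_s(l_n, n) \cdot x$ (i.e., more than half of the
values $j \in [l_n]$, counted with multiplicity) will have

$\left| (M_s(l_n, n) \cdot x)_j - x_n \right| \leq \frac{\epsilon \left\| x - x_{(k/\epsilon)}^{\rm opt} \right\|_1}{k}.$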

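Proof: Fix $n \in S$. We select our multiset, $s \subset [m]$, of the rows of $M$ by independently choosing $\beta$
elements of $[m]$ uniformly at random with replacement. Denote the $j$-th element chosen for $s$ by $s_j$.
Finally, let $P_j^n$ be the random variable indicating whether $M_{s_j,n} > 0$, and let $Q_j^n$ be the random
variable indicating whether $s_j$ satisfies

(12)  $\left| (M \cdot x)_{s_j} - x_n \right| \leq \frac{\epsilon \left\| x - x_{(k/\epsilon)}^{\rm opt} \right\|_1}{k}$

conditioned on $P_j^n$. Thus, $P_j^n = 1$ if $M_{s_j,n} > 0$, and $0$ otherwise. Similarly, $Q_j^n = 1$ if $s_j$ satisfies
Equation (12) and $P_j^n = 1$, and $Q_j^n = 0$ otherwise.

Lemma 1 implies that $\mathbb{P}\left[ Q_j^n = 1 \mid P_j^n = 1 \right] \geq \frac{6}{7}$. Furthermore,

$\mathbb{E}\left[ \sum_{j=1}^{\beta} Q_j^n \,\Big|\, P_1^n, \dots, P_\beta^n \right] \geq \frac{6}{7} \sum_{j=1}^{\beta} P_j^n.$

Let $l_n = \sum_{j=1}^{\beta} P_j^n$. The Chernoff bound (see, e.g., [23]) guarantees that

$\mathbb{P}\left[ \sum_{j=1}^{\beta} Q_j^n < \frac{4 \cdot l_n}{7} \,\Big|\, P_1^n, \dots, P_\beta^n \right] \leq e^{-\frac{1}{18} \cdot \frac{6 l_n}{7}} = e^{-\frac{l_n}{21}}.$

Thus, if $l_n \geq 21$ we can see that $\sum_{j=1}^{\beta} Q_j^n$ will be at most $l_n/2$ with probability less than $e^{-l_n/21}$.
Hence, if $l_n \geq 21 \ln\left( \frac{2|S|}{1-\sigma} \right)$ then Property 2 will fail to be satisfied for $n$ with probability less than
$\frac{1-\sigma}{2|S|}$.

Focusing now on $l_n$, we note that $\mathbb{P}\left[ P_j^n = 1 \right] \geq \frac{K}{m}$ so that $\mu = \mathbb{E}[l_n] \geq \frac{K}{m} \beta$.
Let $l = 21 \ln\left( \frac{2|S|}{1-\sigma} \right)$. Applying the Chernoff bound one additional time reveals that

$\mathbb{P}\left[ l_n < l \right] \leq e^{-\frac{(\mu - l)^2}{2\mu}}$

whenever $\mu > l$. Hence, if we wish to bound $\mathbb{P}\left[ l_n < l \right]$ from above by $\frac{1-\sigma}{2|S|} = e^{-l/21}$ it suffices to have

$\mu^2 - \frac{44}{21}\, l \mu + l^2 \geq 0.$

Setting $\beta \geq 1.36 \cdot \frac{m}{K}\, l = 28.56 \cdot \frac{m}{K} \ln\left( \frac{2|S|}{1-\sigma} \right)$ achieves this goal. The end
result is that $M_s$ will fail to satisfy both Properties 1 and 2 for any $n \in S$ with probability less
than $\frac{1-\sigma}{|S|}$. Applying the union bound over all $n \in S$ finishes the proof. □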
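
The constants 1.36 and 28.56 in the proof above can be sanity-checked numerically. The sketch below assumes the Chernoff step has the form $e^{-(\mu - l)^2/(2\mu)} \leq e^{-l/21}$, which rearranges to a quadratic inequality in $u = \mu/l$; this form is a reading of the argument and not a quotation, and the function name is hypothetical.

```python
from math import sqrt


def smallest_mu_over_l(t=21.0):
    """Larger root of u^2 - (2 + 2/t) u + 1 = 0. For u = mu / l at or above
    this root, (mu - l)^2 / (2 mu) >= l / t holds."""
    b = 1.0 + 1.0 / t
    return b + sqrt(b * b - 1.0)


ratio = smallest_mu_over_l()      # roughly 1.3599, rounded up to 1.36 in the text
beta_constant = 21 * 1.36         # multiplies (m / K) * ln(2|S| / (1 - sigma))


def chernoff_condition_holds(u, t=21.0):
    """Check (u - 1)^2 >= 2 u / t, i.e. the condition with mu = u * l."""
    return (u - 1.0) ** 2 >= 2.0 * u / t
```

Under this reading, any $\mu \geq 1.36\, l$ is enough, and since $\mu \geq \frac{K}{m} \beta$ the stated lower bound on $\beta$ follows.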

Note that Corollary 2 considers selecting a multiset of rows from a $(K, \alpha)$-coherent matrix. Hence,
some rows may be selected more than once. If this occurs, rows should be considered to be
selected multiple times for counting purposes only. That is, all computations involving a row
which is selected several times should still be carried out only once. However, the results of these
computations should be considered with greater weight during subsequent reconstruction efforts
(e.g., multiply selected rows should be considered as generating multiple duplicate entries in
$M_s x$).
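
One way to honor this "compute once, count with multiplicity" convention is to store the sampled rows as a counter and take a weighted median. The following Python sketch is illustrative only; the function names are hypothetical, and returning the lower median when the total count is even is an assumption of this example.

```python
import random
from collections import Counter


def sample_row_multiset(m, beta, seed=0):
    """Draw beta row indices from range(m) uniformly with replacement and
    return them as a Counter mapping row index -> multiplicity."""
    rng = random.Random(seed)
    return Counter(rng.randrange(m) for _ in range(beta))


def weighted_median(values_with_counts):
    """Median of a list in which each value is repeated `count` times,
    computed without materializing the duplicates (lower median on ties)."""
    ordered = sorted(values_with_counts)
    total = sum(count for _, count in ordered)
    running = 0
    for value, count in ordered:
        running += count
        if 2 * running >= total:
            return value


rows = sample_row_multiset(m=10, beta=25, seed=1)
# Each distinct row's measurement would be computed once, then weighted:
example = weighted_median([(5.0, 1), (1.0, 3), (9.0, 1)])
```

Here `example` equals the median of the expanded list `[1.0, 1.0, 1.0, 5.0, 9.0]`, so a row drawn three times influences the estimate three times while being measured only once.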

As above, it is straightforward to show that a random sub matrix, $M_s$, will have $O(\log N)$
ones in every column with high probability when it is constructed as per Corollary 2 from a $(K, \alpha)$-coherent
matrix having $\Theta(K)$ ones per column. In addition, an analogous variant of Corollary 2 can
be proven for DeVore's $(\Theta(K), O(\log_K N))$-coherent matrices by randomly selecting $O\left( \ln\left( \frac{|S|}{1-\sigma} \right) \right)$
blocks of $\Theta(K)$ rows. Randomly selecting rows from a DeVore matrix in blocks this way both
guarantees that all columns of the resulting sub matrix will have $O\left( \ln\left( \frac{|S|}{1-\sigma} \right) \right)$ ones, and also
requires only $O\left( \ln\left( \frac{|S|}{1-\sigma} \right) \ln K \right)$ random bits. Note that we must be able to quickly construct
arbitrary columns of $M_s$ in order to execute Phase 2 of Algorithm 1 in the low memory setting
(i.e., when we can not explicitly store either the entire matrix $M$, or the randomly selected sub
matrix $M_s$, in memory). In this setting DeVore's $(\Theta(K), O(\log_K N))$-coherent matrices allow us to
reconstruct any column of a random sub matrix containing $O\left( \ln\left( \frac{|S|}{1-\sigma} \right) \right)$ blocks of rows, $M_s$, in
just $O\left( \ln\left( \frac{|S|}{1-\sigma} \right) \cdot \log_K N \right)$-time (see Section 2.1 for details).

Corollary 2 will generally be applied with $S \subset [N]$ set to the subset discovered by Phase 1 of
Algorithm 1. Hence, we will generally have $|S|$ equal to the number of rows in a matrix $M_{s'}$
constructed via Corollary 1. In more extreme settings, where we want to be able to estimate all
entries of $x$ with high probability, we will set $S = [N]$. Corollary 2 implies the following theorem.

Theorem 4. Suppose $M$ is an $m \times N$ $(K, \alpha)$-coherent matrix. Let $\epsilon \in (0,1]$, $\sigma \in [2/3,1)$, $k \in$
■ j^], and $x \in \mathbb{R}^N$. Construct $M_s$ as per Corollary 2. Then, with probability at least $\sigma$, $M_s x$
will allow Phase 2 (i.e., lines 15 through 19) of Algorithm 1 in the appendix to estimate all $x_n$ with
$n \in S$ with a $z_n$ satisfying



from above by 21;^ it suffices to have 



$$|z_n - x_n| \le \frac{\epsilon \cdot \left\| x - x^{\mathrm{opt}}_{(k/\epsilon)} \right\|_1}{k}. \tag{13}$$

^In fact, we select the rows for $M_s$ independently of the subset, $S$, found during Phase 1 of Algorithm 1, before
the subset has been identified. Note that we only require an upper bound on the size of $S$ before selecting rows from
$M$ for our estimation matrix. Such an upper bound is supplied in advance by Corollary 1.

The required Phase 2 runtime (and memory complexity) is $O\left(|S| \ln\left(\frac{|S|}{1-\sigma}\right) \cdot \log_K N\right)$ when $M$ is
a $(\Theta(K), \Theta(\log_K N))$-coherent DeVore matrix. Phase 2 requires $O(|S| \cdot \log N)$-time if $M$ is a
$(\Theta(K), \Theta(\log N))$-coherent matrix from Section 3.

Proof: Equation (13) follows from the second property of $M_s$ guaranteed by Corollary 2. Lines 15
through 17 can be accomplished in $O\left(|S| \cdot \ln\left(\frac{|S|}{1-\sigma}\right) \cdot \log_K N\right)$-time using a median-of-medians
algorithm when $M$ is a $(\Theta(K), \Theta(\log_K N))$-coherent DeVore matrix. When $M$ is a $(\Theta(K), \Theta(\log N))$-coherent
matrix from Section 3, lines 15 through 17 can be accomplished in $O(|S| \cdot \log N)$-time.
Lines 18 and 19 can always be accomplished in $O(|S| \log |S|)$-time. □
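The median-based estimation performed in Phase 2 can be illustrated with a small sketch. It assumes NumPy, a toy binary matrix built from degree-at-most-one polynomials over $\mathbb{Z}_p$ (in the spirit of DeVore's polynomial construction, so that each column has $p$ ones and two distinct columns overlap in at most one row), and an exactly sparse input. The names `polynomial_matrix` and `phase2_estimate` are hypothetical, and the sketch makes no attempt to reach the runtime bounds stated above.

```python
import numpy as np

def polynomial_matrix(p):
    """p^2 x p^2 binary matrix: column (a, b) has a one in row (t, v) iff a*t + b = v (mod p)."""
    M = np.zeros((p * p, p * p), dtype=int)
    for a in range(p):
        for b in range(p):
            for t in range(p):
                M[t * p + (a * t + b) % p, a * p + b] = 1
    return M

def phase2_estimate(M, y, S, k):
    """Median-estimate x_n for each n in S from y = M @ x, then keep the 2k largest entries."""
    z = {}
    for n in set(S):
        rows = np.flatnonzero(M[:, n])        # rows with a one in column n
        if rows.size == 0:
            continue
        estimate = float(np.median(y[rows]))  # most of these rows see x_n alone
        if estimate != 0.0:
            z[n] = estimate
    largest = sorted(z, key=lambda n: abs(z[n]), reverse=True)[: 2 * k]
    return {n: z[n] for n in largest}

M = polynomial_matrix(5)                      # p must be prime for the overlap property
x = np.zeros(25)
x[3], x[17] = 3.0, -2.0
estimate = phase2_estimate(M, M @ x, S=range(25), k=2)
```

Here each heavy entry collides with the other in at most one of its five rows, so the median returns it exactly, while every other index sees at least three zero measurements and is discarded.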

We conclude this section by noting that applying (a variant of) Corollary 2 to a $(\Theta(K), \Theta(\log_K N))$-coherent
matrix from Section 2.1 produces a random matrix, $M_s$, having



l-aj J \e \l-a 

rows. Applying Corollary 2 to a $(\Theta(K), \Theta(\log N))$-coherent matrix from Section 3 produces a
random matrix, $M_s$, having



K \l-aJJ \logN \l-aJJ \e \l - a 

rows. For $\sigma$ fixed, this reduces to $O((k/\epsilon) \log(|S|))$ rows. Furthermore, $M_s$ will have $O(\log N)$
ones in all columns with high probability.

5. Main Results 

We may now prove the three new results mentioned in Section 1.1. We have the following
theorem.

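Theorem 5. Let $\epsilon \in (0,1]$, $\sigma \in [2/3,1)$, $x \in \mathbb{R}^N$, and $k \in [N]$. With probability at least $\sigma$
Algorithm 1 will output a vector $z \in \mathbb{R}^N$ satisfying

$$\| x - z \|_2 \le \left\| x - x^{\mathrm{opt}}_k \right\|_2 + \frac{22\epsilon \left\| x - x^{\mathrm{opt}}_{(k/\epsilon)} \right\|_1}{\sqrt{k}} \tag{14}$$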

when executed using any of the following identification and estimation matrices: 

(1) A $(\Theta(k \log N / \epsilon), \Theta(\log N))$-coherent matrix from Section 3 used for estimation via Corollary 2
with $S = [N]$. Only Phase 2 of Algorithm 1 need be applied (i.e., no identification will
be performed). The resulting number of measurements is $O\left(\frac{k}{\epsilon} \cdot \ln\left(\frac{N}{1-\sigma}\right)\right)$. The required
runtime is $O(N \log N)$.

(2) A $\left(\Theta\left(\frac{k}{\epsilon} \log_{k/\epsilon} N\right), \Theta\left(\log_{k/\epsilon} N\right)\right)$-coherent matrix from Section 2.1 used for both identification
(via CorollaryU\variant) and estimation (via Corollary\^variant with \S\ = In (A^/(l — o"))) ). 
The resulting number of measurements is O (^j ■ In (j^) In A^'j . The required runtime is 


^However, using the matrices from Section 3 requires $O(N \log N)$-memory since their columns contain ones in
random locations that must be remembered.

^For the sake of simplicity, we assume $k = \Omega(\log N)$ when stating the measurement and runtime bounds below.

(3) A $(\Theta(k \log N / \epsilon), \Theta(\log N))$-coherent matrix from Section 3 used for identification (via Corollary 1),
and a $\left(\Theta\left(\frac{k}{\epsilon} \log_{k/\epsilon} N\right), \Theta\left(\log_{k/\epsilon} N\right)\right)$-coherent matrix from Section 2.1 used for estimation
(via a Corollary 2 variant with $|S| = O\left(\frac{k}{\epsilon} \ln\left(\frac{k}{\epsilon(1-\sigma)}\right)\right)$). The resulting number of
measurements is O (j - In (j^^ In . The required runtime is O • In (^j^^ ( 

Proof: The runtime and measurement bounds follow from Theorem 3, Theorem 4, and the subsequent
Section 4 discussions. The error guarantee for $z$ follows from Theorem 3, Theorem 4, and
the proof of Theorem 7 in [19]. □

It is interesting to consider the possibility of improving the runtime bounds obtained in Theorem 5
by using iterative recovery techniques akin to those employed in [16]. This appears to be
difficult. In particular, such iterative recovery methods generally require the contributions of partial
solutions to be subtracted from the input measurements of the original vector, $x$, after each of
$O(\log k)$ rounds. Assuming that one must subtract some partial solution containing at least $\Omega(k)$
nonzero entries from a large (constant) fraction of the initial measurements of $x$ at some point
during reconstruction, it becomes clear that updating our measurements will not be possible in $O(k \log N)$-time
unless our measurement matrix contains $O(\log N)$ nonzero entries per column. Unfortunately,
fast nonadaptive identification of previously undiscovered heavy elements of $x$ (e.g., via bit-testing
methods) requires the use of matrices having $\Omega(\log N)$ nonzero entries in many columns during
each new round of iterative approximation. Hence, it appears as if only $O(1)$ rounds of identification
may be performed using the techniques considered herein before the required measurement
matrices have too many ones per column to allow $O(k \log N)$-time recovery. The author
considers this to be (weak) justification for utilizing only one round of identification in Algorithm 1.



6. Conclusion 

In this paper we present a compressed sensing recovery algorithm with an "$\ell_2, \ell_1$" error guarantee
that runs in only $O((k \log k) \log N)$-time. This runtime is within an $O(\log k)$ factor of the known
$\Omega(k \log N)$ lower runtime bound. Demonstrating (or refuting) the existence of an $O(k \log N)$-time
(i.e., linear-time in its required input size) compressed sensing recovery algorithm with similar error
guarantees remains an open problem.

Appendix A. The Recovery Algorithm

See Algorithm 1 below.

Algorithm 1 Approximate x

Input: $M_s$ and $M_s x$ for estimation, and $(M_{s'} \circledast B_N) x$ for identification

Output: $z_S$, an approximation to $x^{\mathrm{opt}}_k$

Initialize multiset $S \leftarrow \emptyset$, $z \leftarrow 0 \in \mathbb{R}^N$, $b \leftarrow 0 \in \mathbb{R}^{\lceil \log_2 N \rceil}$

Phase 1: Identify All Heavy $n \in [0, N) \cap \mathbb{N}$

for $j$ from 1 to $|s'|$ do

for $i$ from 1 to $\lceil \log_2 N \rceil$ do



M,/®(^jv)i+ix) 
^ 1 



> 



i+1 



X 



then 



if 

else 

end if
end for

$S \leftarrow S \cup \{n\}$
end for

Phase 2: Estimate $x_S \approx x^{\mathrm{opt}}_k$ Using Equation (3)
for each $n$ value belonging to $S$ do

$z_n \leftarrow$ median of multiset $\{ (M_s(l_n, n) \cdot x)_h \mid 1 \le h \le l_n \}$
end for

Sort nonzero $z$ entries by magnitude so that $|z_{n_1}| \ge |z_{n_2}| \ge |z_{n_3}| \ge \cdots$
$S \leftarrow \{n_1, n_2, \ldots, n_{2k}\}$
Output: $z_S$
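Phase 1 of Algorithm 1 identifies heavy indices by bit testing. The sketch below illustrates one common form of that idea, assuming NumPy, a bit-test matrix whose first row is all ones followed by one row per binary digit, and a decision rule that compares each bit measurement against its complement. The decision rule, the toy matrix `M`, and the names `bit_test_matrix`, `bit_test_measurements`, and `identify` are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def bit_test_matrix(N):
    """(1 + ceil(log2 N)) x N matrix: a row of ones, then one row per binary digit of n."""
    num_bits = max(1, (N - 1).bit_length())
    B = np.ones((1 + num_bits, N), dtype=int)
    for i in range(num_bits):
        B[i + 1] = (np.arange(N) >> i) & 1
    return B

def bit_test_measurements(M, B, x):
    """Row-wise products: entry [j, i] equals sum_n M[j, n] * B[i, n] * x[n]."""
    return (M[:, None, :] * B[None, :, :]) @ x

def identify(Y):
    """Decode one candidate index per row of M from the measurements alone."""
    S = []
    for y in Y:
        # Bit i is declared 1 when the entries with that bit set outweigh the rest.
        bits = np.abs(y[1:]) > np.abs(y[0] - y[1:])
        S.append(sum(1 << i for i, bit in enumerate(bits) if bit))
    return S

N = 16
M = (np.arange(N)[None, :] % 4 == np.arange(4)[:, None]).astype(int)  # toy binary matrix
x = np.full(N, 0.01)                  # small background entries
x[13], x[6] = 5.0, -1.0               # two heavy entries
S = identify(bit_test_measurements(M, bit_test_matrix(N), x))
```

Rows of `M` that isolate a heavy entry decode its index; rows containing no heavy entry return an arbitrary index, which is why a subsequent estimation and pruning step is still needed.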
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